1 Error correction vs. query garbling for Arabic OCR document retrieval
Kareem Darwish, Walid Magdy 

November 2007 ACM Transactions on Information Systems (TOIS), Volume 26 Issue 1

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (585.25 KB)



Due to the existence of large numbers of legacy documents (such as old 
books and newspapers), improving retrieval effectiveness for OCR'ed 
documents continues to be an important problem. This article compares 
the effect of OCR error correction with and ... 

Keywords: Arabic Retrieval, OCR Correction, OCR Retrieval 



2 Error correction in a Chinese OCR test collection 
Yuen-Hsien Tseng 

August 2002 SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Full text available: pdf (126.00 KB)



This article proposes a technique for correcting Chinese OCR errors to 
support retrieval of scanned documents. The technique uses a 
completely automatic technique (no manually constructed lexicons or 
confusion resources) to identify both keywords and ... 






Keywords: Chinese, confusing pair, error correction, term clustering 



3 A filter based post-OCR accuracy boost system
Eugene Borovikov, Ilya Zavorin, Mark Turner 
November 2004 HDP '04: Proceedings of the 1st ACM workshop on Hardcopy document processing

Publisher: ACM 

Additional Information: full citation, references, index terms
Full text available: pdf (126.24 KB)

Our current research effort aims at building a filter based post-OCR 
accuracy boost system that will combine different post-OCR correction 
filters to improve the OCR accuracy better than each individual filter can.
In this paper we focus on a Hidden ... 

Keywords: HMM, OCR, error correction, information retrieval, machine 
translation, statistical language models 





4 Information access in the presence of OCR errors

Kazem Taghva, Thomas Nartker, Julie Borsack 
November 2004 HDP '04: Proceedings of the 1st ACM workshop on Hardcopy document processing

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (139.50 KB)

Over the last 15 years, the Information Science Research Institute 
(ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted 
information access research in the presence of OCR errors. Our research 
has focused on issues associated with the construction ... 

Keywords: categorization, document conversion, information 
extraction, markup 



5 Adaptive Hindi OCR using generalized Hausdorff image comparison
Huanfeng Ma, David Doermann 

September 2003 ACM Transactions on Asian Language Information Processing (TALIP), volume 2 issue 3

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (280.45 KB)

We present an adaptive Hindi OCR implemented as part of a rapidly 
retargetable language tool effort. The system includes: script 
identification, character segmentation, training sample creation, and 
character recognition. In script identification, Hindi ... 

Keywords: Optical character recognition (OCR), document processing, 
generalized Hausdorff image comparison, script identification 




6 Securing passwords against dictionary attacks

Benny Pinkas, Tomas Sander 

November 2002 CCS '02: Proceedings of the 9th ACM conference on Computer and communications security

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (216.72 KB)

The use of passwords is a major point of vulnerability in computer 
security, as passwords are often easy to guess by automated programs 
running dictionary attacks. Passwords remain the most widely used 
authentication method despite their well-known security ... 



7 Fast Approximate Search in Large Dictionaries
Stoyan Mihov, Klaus U. Schulz 

December 2004 Computational Linguistics, volume 30 issue 4 
Publisher: MIT Press

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (509.51 KB)

The need to correct garbled strings arises in many areas of natural 
language processing. If a dictionary is available that covers all possible 
input tokens, a natural set of candidates for correcting an erroneous 
input P is the set of all words ... 




8 Joint categorization of queries and clips for web-based video search

Ruofei Zhang, Ramesh Sarukkai, Jyh-Herng Chow, Wei Dai, Zhongfei Zhang 
October 2006 MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (378.76 KB)

Building a video search engine on the Web is a very challenging 
problem. Compared with web page search, video search has its unique 
characteristics (such as high volume of data for each video, existence of 
multi-modal information including meta-data, ... 

Keywords: experiment, multi-modality based categorization, query 
categorization, video categorization, web-based video search 



9 Multi-modal information retrieval from broadcast video using OCR and speech recognition

Alexander G. Hauptmann, Rong Jin, Tobun Dorbin Ng

July 2002 JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (205.98 KB)

We examine multi-modal information retrieval from broadcast video 
where text can be read on the screen through OCR and speech 
recognition can be performed on the audio track. OCR and speech 
recognition are compared on the 2001 TREC Video Retrieval 
evaluation ... 

Keywords: multi-modal video information retrieval, optical character 
recognition OCR, speech recognition 



10 A generative probabilistic OCR model for NLP applications

Okan Kolak, William Byrne, Philip Resnik 

May 2003 NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, Volume 1

Publisher: Association for Computational Linguistics 

Additional Information: full citation, abstract, references, cited by
Full text available: pdf (144.33 KB)

In this paper, we introduce a generative probabilistic optical character 
recognition (OCR) model that describes an end-to-end process in the 
noisy channel framework, progressing from generation of true text 
through its transformation into the noisy output ...



11 Information retrieval and OCR: from converting content to grasping meaning

Jamie Callan, Paul Kantor, David Grossman

September 2002 ACM SIGIR Forum, volume 36 issue 2 
Publisher: ACM 

Additional Information: full citation, abstract
Full text available: pdf (33.44 KB)



IR and OCR have largely developed independent standards and metrics, 
with OCR focused on literal accuracy, and IR focused on essential 
"content/meaning". With more and more media not only paper, but in 
multiple image formats, the opportunities and challenges ... 




12 Adaptive text correction with Web-crawled domain-dependent dictionaries

Christoph Ringlstetter, Klaus U. Schulz, Stoyan Mihov 
October 2007 ACM Transactions on Speech and Language Processing (TSLP), Volume 4 Issue 4
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (390.15 KB)

For the success of lexical text correction, high coverage of the underlying 
background dictionary is crucial. Still, most correction tools are built on 
top of static dictionaries that represent fixed collections of expressions of 
a given language. When ... 

Keywords: Adaptive techniques, Web crawling, dictionaries, domains, 
error correction 




13 Computer programs for detecting and correcting spelling errors

James L. Peterson 

December 1980 Communications of the ACM, volume 23 issue 12 
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by
Full text available: pdf (1.25 MB)

With the increase in word and text processing computer systems, 
programs which check and correct spelling will become more and more 
common. Peterson investigates the basic structure of several such 
existing programs and their approaches to solving the ... 

Keywords: spelling, spelling correction, spelling dictionary, spelling 
programs 



14 Dynamic spatial approximation trees

Gonzalo Navarro, Nora Reyes

August 2007 Journal of Experimental Algorithmics (JEA), volume 12
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (2.33 MB)

Metric space searching is an emerging technique to address the problem
of efficient similarity searching in many applications, including
multimedia databases and other repositories handling complex objects. 
Although promising, the metric space approach ... 

Keywords: Multimedia databases, similarity or proximity search, spatial 
and multidimensional search, spatial approximation tree 



15 A new generation of textual corpora: mining corpora from very large collections

Gordon Stewart, Gregory Crane, Alison Babeu 

June 2007 JCDL '07: Proceedings of the 2007 conference on Digital libraries 
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (253.35 KB)

While digital libraries based on page images and automatically generated 
text have made possible massive projects such as the Million Book 
Library, Open Content Alliance, Google, and others, humanists still 
depend upon textual corpora expensively produced ... 

Keywords: OCR evaluation, ancient greek, text alignment 



16 A hash code method for detecting and correcting spelling errors

M. Mor, A. S. Fraenkel 

December 1982 Communications of the ACM, volume 25 issue 12 
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (458.71 KB)

The most common spelling errors are one extra letter, one missing 
letter, one wrong letter, or the transposition of two letters. Deletion, 
exchange, and rotation operators are defined which detect and "mend" 
such spelling errors and thus ... 

Keywords: deletion, dictionary, exchange, rotation, spelling, spelling 
errors 
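As an illustration of the four single-error classes named in the Mor and Fraenkel abstract above (one extra letter, one missing letter, one wrong letter, one transposition), the following minimal Python sketch generates every string one such edit away from an input and keeps those found in a dictionary. It is a generic brute-force candidate generator assumed for illustration, not the hash-code method of the paper; the function name single_error_candidates and the toy dictionary are hypothetical.

```python
import string


def single_error_candidates(word, dictionary, alphabet=string.ascii_lowercase):
    """Return the dictionary words that differ from `word` by exactly one
    extra letter, missing letter, wrong letter, or transposition of two
    adjacent letters. Hypothetical brute-force sketch, not the paper's method."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # One extra letter in the input: delete a character.
    deletes = {a + b[1:] for a, b in splits if b}
    # One missing letter in the input: insert a character.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # One wrong letter: substitute a different character.
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet if c != b[0]}
    # Two adjacent letters transposed: swap them back.
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    candidates = deletes | inserts | replaces | transposes
    candidates.discard(word)  # swapping two equal letters reproduces the input
    return sorted(candidates & set(dictionary))


# Hypothetical toy dictionary for illustration.
print(single_error_candidates("speling", {"spelling", "spewing", "error"}))
# prints ['spelling', 'spewing']
```

Sorting only makes the output deterministic; a practical corrector would still need to rank the candidates, for example with an error model of the noisy-channel kind described in entry 10.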




17 Probabilistic structured query methods 

Kareem Darwish, Douglas W. Oard

July 2003 SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (301.72 KB)

Structured methods for query term replacement rely on separate
estimates of term frequency and document frequency to compute a weight for
each query term. This paper reviews prior work on structured ...

Keywords: CLIR, OCR, arabic, structured queries, term replacement 



18
Henry S. Baird, Daniel Lopresti, Brian D. Davison, William M. Pottenger
November 2004 HDP '04: Proceedings of the 1st ACM workshop on Hardcopy document processing

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (92.22 KB)

No existing document image understanding technology, whether 
experimental or commercially available, can guarantee high accuracy 
across the full range of documents of interest to industrial and 
government agency users. Ideally, users should be able to ... 

Keywords: OCR error management, document analysis, information 
retrieval 




19 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), volume 24 issue 4 
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms, review
Full text available: pdf (6.23 MB)

Research aimed at correcting words in text has focused on three
progressively more difficult problems: (1) nonword error detection; (2)
isolated-word error correction; and (3) context-dependent word
correction. In response to the first problem, efficient ...

Keywords: n-gram analysis, Optical Character Recognition (OCR), 
context-dependent spelling correction, grammar checking, natural- 
language-processing models, neural net classifiers, spell checking, 
spelling error detection, spelling error patterns, statistical-language 
models, word recognition and correction 



20 The lifecycle of a digital historical document: structure and content

A. Antonacopoulos, D. Karatzas, H. Krawczyk, B. Wiszniewski 
October 2004 DocEng '04: Proceedings of the 2004 ACM symposium on Document engineering
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms, review
Full text available: pdf (313.24 KB)

This paper describes the lifecycle of a digital historical document, from 
template-based structure definition through to content extraction from 
the scanned pages and its final reconstitution as an electronic document 
(combining content and semantic ... 

Keywords: digital libraries, document analysis, document architecture, 
document engineering, historical documents, text enhancement 




Results 1 - 20 of 96

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2008 ACM, Inc. 




http://portal.acm.org/resultsx 1/18/2008 



Results (page 1): search documents and ocr and translate keywords

Terms used: search documents ocr translate keywords

Found 23 of 238,048

Results 1 - 20 of 23




1 Document digitization lifecycle for complex magazine collection

Sherif Yacoub, John Burns, Paolo Faraboschi, Daniel Ortega, Jose Abad 
Peiro, Vinay Saxena 

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (540.79 KB)

The conversion of large collections of documents from paper to digital 
formats that are suitable for electronic archival is a complex multi-phase 
process. The creation of good quality images from paper documents is 
just one phase. To extract relevant ... 

Keywords: document analysis and understanding, document 
digitization, document engineering, preservation of historical content 







2 Eclipse modeling framework for document management

Neil Boyette, Vikas Krishna, Savitha Srinivasan 

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (180.43 KB)

The lifecycle of document management applications typically comprises a 
set of loosely coupled subsystems that provide capture, index, search, 
workflow, fulfillment and archival features. However, there exists no 
standard model for composing these elements ... 

Keywords: DMS, EMF, IT, ROI, business transformation, documents, 
eclipse, framework, logistics, modeling, process 



3 Extracting mathematical expressions from postscript documents
Michael Yang, Richard Fateman 

July 2004 ISSAC '04: Proceedings of the 2004 international symposium on Symbolic and algebraic computation
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (137.43 KB)

Full-text indexing of documents containing mathematics cannot be 
considered a complete success unless the mathematics symbolism is 
extracted and represented in a standardized form permitting both 
searching for formulas, and re-use of this information ... 

Keywords: digital library, document image analysis, mathematics, 
optical character recognition, postscript documents 



4 Joint visual-text modeling for automatic retrieval of multimedia documents

G. Iyengar, P. Duygulu, S. Feng, P. Ircing, S. P. Khudanpur, D. Klakow, M.
R. Krause, R. Manmatha, H. J. Nock, D. Petkova, B. Pytlik, P. Virga 
November 2005 MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (435.17 KB)

In this paper we describe a novel approach for jointly modeling the text 
and the visual components of multimedia documents for the purpose of 
information retrieval(IR). We propose a novel framework where 
individual components are developed to model different ... 



Keywords: TRECVID, joint visual-text models, multimedia retrieval 
models, retrieval models 




5 Scoring missing terms in information retrieval tasks

Egidio Terra, Charles L.A. Clarke 

November 2004 CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (286.12 KB)

An usual approach to address mismatching vocabulary problem is to 
augment the original query using dictionaries and other lexical resources 
and/or by looking at pseudo-relevant documents. Either way, terms are 
added to form a new query that will be used ... 



Keywords: automatic query expansion, document retrieval, passage 
retrieval 



6 The multivalent browser: a platform for new ideas 

Thomas A. Phelps, Robert Wilensky 

November 2001 DocEng '01: Proceedings of the 2001 ACM Symposium on Document engineering

Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (188.51 KB)

The Multivalent Browser is built on a architecture that separates 
functionality from concrete document format. Almost all functionality is 
made available via relatively small modules of code called behaviors that 
programmers can write to extend the core ... 





Keywords: annotation, architecture, digital, document, multivalent 
behavior, paper, scanned 



7 The digital atheneum: new approaches for preserving, restoring and analyzing damaged manuscripts

Michael S. Brown, W. Brent

January 2001 JCDL '01: Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (2.30 MB)

This paper presents research focused on developing new techniques and 
algorithms for the digital acquisition, restoration, and study of damaged 
manuscripts. We present results from an acquisition effort in partnership 
with the British Library, funded ... 

Keywords: digital libraries, digital preservation, document analysis, 
humanities computing, restoration 



8 Task-based interaction with an integrated multilingual multimedia information system: a formative evaluation

Pengyi Zhang, Lynne Plettenberg, Judith L. Klavans, Douglas W. Oard,
Dagobert Soergel 

June 2007 JCDL '07: Proceedings of the 2007 conference on Digital libraries 
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (720.25 KB)

This paper describes a formative evaluation of an integrated multilingual, 
multimedia information system, a series of user studies designed to 
guide system development. The system includes automatic speech 
recognition for English, Chinese, and Arabic, ... 

Keywords: cross-language information retrieval, multimedia, user 
studies 




9 Adaptive information extraction

Jordi Turmo, Alicia Ageno, Neus Catala 

July 2006 ACM Computing Surveys (CSUR), volume 38 issue 2
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (986.35 KB)

The growing availability of online textual sources and the potential 
number of applications of knowledge acquisition from textual data has 
lead to an increase in Information Extraction (IE) research. Some 
examples of these applications are the generation ... 

Keywords: Information extraction, machine learning 



10 An automatic sign recognition and translation system
Jie Yang, Jiang Gao, Ying Zhang, Xilin Chen, Alex Waibel

November 2001 PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces




Publisher: ACM 

Additional Information: full citation, abstract, references
Full text available: pdf (1.21 MB)

A sign is something that suggests the presence of a fact, condition, or 
quality. Signs are everywhere in our lives. They make our lives easier 
when we are familiar with them. But sometimes they pose problems. For 
example, a tourist might not be able ... 

Keywords: perceptive user interface, sign detection, sign translation, 
vision-based interface 



11 Efficient web browsing on handheld devices using page and form summarization

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms, review
Full text available: pdf (4.47 MB)

We present a design and implementation for displaying and manipulating 
HTML pages on small handheld devices such as personal digital 
assistants (PDAs), or cellular phones. We introduce methods for 
summarizing parts of Web pages and HTML forms. Each Web ... 

Keywords: PDA, Personal digital assistant, WAP, WML, forms, handheld 
computers, mobile computing, summarization, ubiquitous computing, 
wireless computing 




12 Machine learning in automated text categorization

Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (524.41 KB)

The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last 10 years, due to 
the increased availability of documents in digital form and the ensuing 
need to organize them. In the research ... 

Keywords: Machine learning, text categorization, text classification 




13 Towards automatic sign translation

Jie Yang, Jiang Gao, Ying Zhang, Alex Waibel 

March 2001 HLT '01: Proceedings of the first international conference on Human language technology research
Publisher: Association for Computational Linguistics 

Additional Information: full citation, abstract, references, cited by
Full text available: pdf (208.10 KB)

Signs are everywhere in our lives. They make our lives easier when we 
are familiar with them. But sometimes they also pose problems. For 
example, a tourist might not be able to understand signs in a foreign 
country. In this paper, we present our efforts ... 

Keywords: sign, sign detection, sign recognition, sign translation 





14 Commercial applications of natural language processing

Kenneth W. Church, Lisa F. Rau

November 1995 Communications of the ACM, volume 38 issue 11
Publisher: ACM 

Additional Information: full citation, abstract, references, cited by, index terms
Full text available: pdf (314.22 KB)

Vast quantities of text are becoming available in electronic form, ranging 
from published documents (e.g., electronic dictionaries, encyclopedias, 
libraries and archives for information retrieval services), to private 
databases (e.g., marketing information, ... 



15 The use of topic evolution to help users browse and find answers in news video corpus
Shi-Yong Neo, Yuanyuan Ran, Hai-Kiat Goh, Yantao Zheng, Tat-Seng Chua,
Jintao Li 

September 2007 MULTIMEDIA '07: Proceedings of the 15th international conference on Multimedia

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (626.56 KB)

Earlier research in news video has been focusing mainly on improving 
retrieval accuracies given the limited amount of extractable video 
semantics. In this paper, we propose an enhancement to news video 
searching by leveraging extractable video semantics ... 

Keywords: event evolution, video analysis, video question answering 



16 Towards to an automatic semantic annotation for multimedia learning objects

Stephan Repp, Serge Linckels, Christoph Meinel

September 2007 Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (381.14 KB)

The number of digital video recordings has increased dramatically. The 
idea of recording lectures, speeches, and other academic events is not 
new. But, the accessibility and traceability of its content for further use 
is rather limited. Searching multimedia ... 

Keywords: multimedia knowledge base, multimedia retrieval, speech 




17 Efficient Web form entry on PDAs 

Oliver Kaljuvee, Orkut Buyukkokten, Hector Garcia-Molina, Andreas
Paepcke 

April 2001 WWW '01: Proceedings of the 10th international conference on World Wide Web
Publisher: ACM 

Additional Information: full citation, references, cited by, index terms
Full text available: pdf (398.94 KB)




Keywords: PDA, WAP, forms, mobile computing, wireless access 



18 PicturePiper: using a re-configurable pipeline to find images on the Web

Adam M. Fass, Eric A. Bier, Eytan Adar

November 2000 UIST '00: Proceedings of the 13th annual ACM symposium on User interface software and technology

Publisher: ACM 

Additional Information: full citation, references, index terms
Full text available: pdf (364.31 KB)

Keywords: WWW searching, dataflow, image retrieval, pipeline 



19 A retrospective look at Greenstone: lessons from the first decade 



Ian H. Witten, David Bainbridge
June 2007 JCDL '07: Proceedings of the 2007 conference on Digital libraries
Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (572.61 KB)

The Greenstone Digital Library Software has helped spread the practical 
impact of digital library technology throughout the world, with particular 
emphasis on developing countries. As Greenstone enters its second 
decade, this article takes a retrospective ... 

Keywords: architecture, greenstone, internationalization 



20 Adapting handwriting recognition for applications in algebra learning
Lisa Anthony, Jie Yang, Kenneth R. Koedinger

September 2007 Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education

Publisher: ACM 

Additional Information: full citation, abstract, references, index terms
Full text available: pdf (308.71 KB)

In this paper we report the progress of our ongoing project exploring the 
adaptation of handwriting recognition-based interfaces for applications in 
intelligent tutoring systems for students learning algebra equation- 
solving. The research is motivated ... 

Keywords: handwriting input, handwriting recognition, mathematics 
learning, recognition accuracy evaluation 






Results (page 1): input documents and ocr and output XML 



Page 1 of 7 



a PORTAL 



Subscribe (Full Service) Register (Limited Service, Free) Login 
Search: ® The ACM Digital Library O The Guide 



USPTO 



input documents and ocr and output XML 




I* Feedback 



input documents and ocr and output XML 
Terms used: input documents ocr output XML 



Found 38 of 238,048 



Sort results | relevance vf ^ Save results to a Binder Refine^ these results with Advanced 

Di spiay I expanded form 7-1 □ Open results in a new Tr V this search in The ACM Guide 

results I — — 1 



window 



Results 1 - 20 of 38 







1 The lifecycle of a digital historical document: structure and content 

A. Antonacopoulos, D. Karatzas, H. Krawczyk, B. Wiszniewski
October 2004 DocEng '04: Proceedings of the 2004 ACM symposium on

Document engineering 
Publisher: ACM 

Full text available: pdf (313.24 KB)
Additional Information: full citation, abstract, references, cited by, index terms, review

This paper describes the lifecycle of a digital historical document, from 
template-based structure definition through to content extraction from 
the scanned pages and its final reconstitution as an electronic document 
(combining content and semantic ... 

Keywords: digital libraries, document analysis, document architecture, 
document engineering, historical documents, text enhancement 



2 Digital document life cycle development

Henryk Krawczyk, Bogdan Wiszniewski 

September 2003 ISICT '03: Proceedings of the 1st international symposium 

on Information and communication technologies 
Publisher: Trinity College Dublin 

Full text available: pdf (198.83 KB)
Additional Information: full citation, abstract, references

The paper reports on the ongoing project IST-2001-33441-MEMORIAL
funded by EU under the Framework 5 Programme, aimed at developing
an interactive electronic document model suitable for creating a Virtual
Memorial Web portal. Documents are based on personal ...



3 Document digitization lifecycle for complex magazine collection

Sherif Yacoub, John Burns, Paolo Faraboschi, Daniel Ortega, Jose Abad Peiro, Vinay Saxena

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on 

Document engineering 

Publisher: ACM 

Full text available: pdf (540.79 KB)
Additional Information: full citation, abstract, references, cited by, index terms




The conversion of large collections of documents from paper to digital 
formats that are suitable for electronic archival is a complex multi-phase 
process. The creation of good quality images from paper documents is 
just one phase. To extract relevant ... 

Keywords: document analysis and understanding, document 
digitization, document engineering, preservation of historical content 




4 Content publishing framework for interactive paper documents

Moira C. Norrie, Alexios Palinginis, Beat Signer 

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on 

Document engineering 

Publisher: ACM 

Full text available: pdf (721.89 KB)
Additional Information: full citation, abstract, references, index terms

Paper persists as an important medium for documents and this has 
motivated the development of new technologies for interactive paper 
that enable actions on paper to be linked to digital actions. A major issue 
that remains is how to integrate these technologies ... 

Keywords: interactive paper, publishing framework 




5 Extracting mathematical expressions from postscript documents
Michael Yang, Richard Fateman 

July 2004 ISSAC '04: Proceedings of the 2004 international symposium on 

Symbolic and algebraic computation 
Publisher: ACM 

Full text available: pdf (137.43 KB)
Additional Information: full citation, abstract, references, index terms

Full-text indexing of documents containing mathematics cannot be 
considered a complete success unless the mathematics symbolism is 
extracted and represented in a standardized form permitting both 
searching for formulas, and re-use of this information ... 

Keywords: digital library, document image analysis, mathematics, 
optical character recognition, postscript documents 



6 Visual signature based identification of Low-resolution document
images

Ardhendu Behera, Denis Lalanne, Rolf Ingold

October 2004 DocEng '04: Proceedings of the 2004 ACM symposium on 

Document engineering 
Publisher: ACM 

Full text available: pdf (2.00 MB)
Additional Information: full citation, abstract, references, cited by, index terms

In this paper, we present (a) a method for identifying documents 
captured from low-resolution devices such as web-cams, digital cameras 
or mobile phones and (b) a technique for extracting their textual content 
without performing OCR. The first method ... 

Keywords: document visual signature, document-based meeting 
retrieval, documents' content extraction, low-resolution document image 
identification 






7 Building a test collection for complex document information
processing

D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, J. Heard

August 2006 SIGIR '06: Proceedings of the 29th annual international ACM 

SIGIR conference on Research and development in information 

retrieval 
Publisher: ACM 

Full text available: pdf (294.?3 KB)
Additional Information: full citation, abstract, references, cited by, index terms

Research and development of information access technology for scanned 
paper documents has been hampered by the lack of public test 
collections of realistic scope and complexity. As part of a project to 
create a prototype system for search and mining ... 

Keywords: TREC, corpora, metadata, queries, relevance judgments 




8 Enhancing composite digital documents using XML-based standoff
markup

Peter L. Thomas, David F. Brailsford 

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on 

Document engineering 

Publisher: ACM 

Full text available: pdf (695.86 KB)
Additional Information: full citation, abstract, references, index terms

Document representations can rapidly become unwieldy if they try to 
encapsulate all possible document properties, ranging from abstract 
structure to detailed rendering and layout. We present a composite 
document approach wherein an XML-based document ... 

Keywords: MathML, MusicXML, PDF, XBL, XML, composite documents, 
standoff markup 



9 Towards smarter documents 




Vikas Krishna, Prasad M. Deshpande, Savitha Srinivasan 

November 2004 CIKM '04: Proceedings of the thirteenth ACM international 



conference on Information and knowledge management 

Publisher: ACM 

Full text available: pdf (224.70 KB)
Additional Information: full citation, abstract, references, cited by, index terms

Document analysis research typically focuses on document image 
understanding or classic problems in text classification, clustering, 
summarization and discovery. While that is an important aspect of 
document management, in practice, documents lifecycles ... 

Keywords: classification, content, processes, workflow 



10 Structuring documents according to their table of contents

Herve Dejean, Jean-Luc Meunier
November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on

Document engineering 

Publisher: ACM 

Full text available: pdf (544.22 KB)
Additional Information: full citation, abstract, references, index terms

In this paper, we present a method for structuring a document according 
to the information present in its Table of Contents. The detection of the 
ToC as well as the determination of the parts it refers to in the document 
body rely on a series of generic ... 

Keywords: document structuring, table of contents recognition 



11 Logical document conversion: combining functional and formal
knowledge

Herve Dejean, Jean-Luc Meunier 

August 2007 DocEng '07: Proceedings of the 2007 ACM symposium on 

Document engineering 
Publisher: ACM 

Full text available: pdf (297.96 KB)
Additional Information: full citation, abstract, references, index terms

We present in this paper a method for document layout analysis based 
on identifying the function of document elements (what they do). This 
approach is orthogonal and complementary to the traditional view based 
on the form of document elements (how they ... 

Keywords: combination of knowledge, feedback, functional analysis, 
logical document analysis, methodology 



12 Robust document image understanding technologies

Henry S. Baird, Daniel Lopresti, Brian D. Davison, William M. Pottenger 
November 2004 HDP '04: Proceedings of the 1st ACM workshop on 

Hardcopy document processing 

Publisher: ACM 

Full text available: pdf (92.22 KB)
Additional Information: full citation, abstract, references, cited by, index terms

No existing document image understanding technology, whether 
experimental or commercially available, can guarantee high accuracy 
across the full range of documents of interest to industrial and 
government agency users. Ideally, users should be able to ... 

Keywords: OCR error management, document analysis, information 
retrieval 



13 Supervised learning for the legacy document conversion
Boris Chidlovskii, Jerome Fuselier

October 2004 DocEng '04: Proceedings of the 2004 ACM symposium on

Document engineering 
Publisher: ACM 

Full text available: pdf
Additional Information: full citation, abstract, references, index terms

We consider the problem of document conversion from the rendering- 
oriented HTML markup into a semantic-oriented XML annotation defined 
by user-specific DTDs or XML Schema descriptions. We represent both 
source and target documents as rooted ordered trees ... 

Keywords: XML markup, legacy document conversion, machine 
learning 







14 Extracting relevant named entities for automated expense 
reimbursement 

Guangyu Zhu, Timothy J. Bethea, Vikas Krishna 

August 2007 KDD '07: Proceedings of the 13th ACM SIGKDD international 

conference on Knowledge discovery and data mining 
Publisher: ACM 

Full text available: pdf (1.16 MB)
Additional Information: full citation, abstract, references, index terms

Expense reimbursement is a time-consuming and labor-intensive process 
across organizations. In this paper, we present a prototype expense 
reimbursement system that dramatically reduces the elapsed time and 
costs involved, by eliminating paper from the ... 

Keywords: conditional random fields, document layout analysis, 
learning, named entity extraction 




15 HeinOnline 

Richard J. Marisa 

January 2001 JCDL '01: Proceedings of the 1st ACM/IEEE-CS joint 

conference on Digital libraries 
Publisher: ACM 

Full text available: pdf (217.08 KB)
Additional Information: full citation, abstract, references, index terms

HeinOnline is a new online archive of law journals. Development of
HeinOnline began in late 1997 through the cooperation of Cornell
Information Technologies, William S. Hein & Co., Inc. of Buffalo, NY, and
the Cornell Law Library. Built upon the familiar ...

Keywords: dienst, digital library, document structure, law journals, 
metadata, system design 



16 UpLib: a universal personal digital library system

William C. Janssen, Kris Popat

November 2003 DocEng '03: Proceedings of the 2003 ACM symposium on

Document engineering 

Publisher: ACM 

Full text available: pdf (261.28 KB)
Additional Information: full citation, abstract, references, cited by, index terms

We describe the design and use of a personal digital library system, 
UpLib. The system consists of a full-text indexed repository accessed 
through an active agent via a Web interface. It is suitable for personal 
collections comprising tens of thousands ... 


Keywords: document management, document repository, page image, 
personal digital library, thumbnail interfaces, web interfaces 

17 INFTY: an integrated OCR system for mathematical documents
Masakazu Suzuki, Fumikazu Tamari, Ryoji Fukuda, Seiichi Uchida, Toshihiro 
Kanahori 

November 2003 DocEng '03: Proceedings of the 2003 ACM symposium on 

Document engineering 




Publisher: ACM 

Full text available: pdf (322.41 KB)
Additional Information: full citation, abstract, references, cited by, index terms

An integrated OCR system for mathematical documents, called INFTY, is 
presented. INFTY consists of four procedures, i.e., layout analysis, 
character recognition, structure analysis of mathematical expressions, 
and manual error correction. In those procedures, ... 

Keywords: character and symbol recognition, mathematical OCR, 
structure analysis of mathematical expressions 




18 The processing of digitized works

Jose Borbinha, Joao Gil, Gilberto Pedrosa, Joao Penas 
June 2006 JCDL '06: Proceedings of the 6th ACM/IEEE-CS joint conference 

on Digital libraries 
Publisher: ACM 

Full text available: pdf (2.74 MB)
Additional Information: full citation, abstract, references, index terms

This paper describes the processing of digitised works at the National
Library of Portugal, as done in the scope of the National Digital Library
initiative (BND). This comprises the normalization of the names of the
images, the creation of technical metadata, ...

Keywords: METS, OCR, best practices, digital publishing, digitization, 
image processing, structural metadata 




19 A system for understanding imaged infographics and its applications

Weihua Huang, Chew Lim Tan 

August 2007 DocEng '07: Proceedings of the 2007 ACM symposium on 

Document engineering 
Publisher: ACM 

Full text available: pdf (1.13 MB)
Additional Information: full citation, abstract, references, index terms

Information graphics, or infographics, are visual representations of 
information, data or knowledge. Understanding of infographics in 
documents is a relatively new research problem, which becomes more 
challenging when infographics appear as raster images. ... 


Keywords: applications, association of text and graphics, document 
image understanding, infographics 



20 Injecting information into atomic units of text




Yannis Haralambous, Gabor Bella 

November 2005 DocEng '05: Proceedings of the 2005 ACM symposium on 



Document engineering 

Publisher: ACM 

Full text available: pdf (244.01 KB)
Additional Information: full citation, abstract, references, cited by, index terms

This paper presents a new approach to text processing, based on 
textemes. These are atomic text units generalising the concepts of 
character and glyph by merging them in a common data structure, 
together with an arbitrary number of user-defined properties. ... 




Keywords: OpenType, PDF, SVG, Unicode, character, glyph, 
multilingual typesetting, omega, texteme 




ocr and search documents - Google Patents Page 1 of 2


Patents 1 - 10 on ocr and search documents.


Information processing method and 
apparatus, and storage medium storing ... 

US Pat. 6310971 - Filed Jun 28, 1996 - Canon Kabushiki Kaisha 

The characters likely to be improperly recognized by the OCR are stored in the 

... This processing can eliminate documents including search keys but not ... 



System and method for portable document indexing using n-gram word ... 

US Pat. 5706365 - Filed Apr 10, 1995 - Rebus Technology, Inc. 

However, a detailed index may provide the benefit of reduced search times. ...

OCR systems portability of indexed documents, are generally very sensitive ... 

System and method for searching electronic documents created with optical ... 

US Pat. 6480838 - Filed Jun 9, 2000 

Search engines sometimes fail to locate documents that have been created using 
scanners and OCR software. This is due to the existence of numerous errors in ... 

Non-literal textual search using fuzzy finite non-deterministic automata 

US Pat. 5606690 - Filed Oct 28, 1994 - Canon Inc. 

Scanning systems often utilize optical character recognition (OCR) that converts 
text portions of scanned images into electronic data. Stored documents thus ... 

Word grouping accuracy value generation 

US Pat. 6269188 - Filed Mar 12, 1998 - Canon Kabushiki Kaisha 

For example, OCR 1 may recognize handwriting particularly accurately. ... to index 

the documents or provide search terms for searching within the document. ... 



Non-literal textual search using fuzzy finite-state linear non-deterministic ... 

US Pat. 6018735 - Filed Aug 22, 1997 - Canon Kabushiki Kaisha 

Scanning systems often utilize optical character recognition (OCR) that converts 

text portions of scanned images into electronic data. Stored documents thus ... 

Mass document storage and retrieval system

US Pat. 5109439 - Filed Jun 12, 1990 

Search in many OCR conversion systems. Furthermore, such systems have no provision 
... In addition, rather rapid processing of documents and storage of the ... 

Image matching and retrieval by multi-access redundant hashing 

US Pat. 5465353 - Filed Apr 1, 1994 - Ricoh Company, Ltd. 

One query that is of interest in the above example of a document database is a 

search for documents with a given passage of text. ... 



System for searching a corpus of document images by user specified document ... 

US Pat. 5999664 - Filed Nov 14, 1997 - Xerox Corporation 

Many document search and retrieval systems rely entirely on the results of applying 
OCR (Optical Character Recognition) to every scanned document image. ... 

Method and apparatus for providing automated searching and linking of ... 






US Pat. 6138129 - Filed Dec 16, 1997 - World One Telecom, Ltd. 

using optical character recognition (OCR) to create an electronic file(s) of the 

one or more documents; (2) editing/formatting the OCR documents; ...


©2007 Google 



http://www.google.com/patents?q=ocr+and+search+documents 



1/18/2008 



ocr and search documents and translate keyword - Google Patents Page 1 of 1


Patents 1 - 1 on ocr and search documents and translate keyword.


Imaged document optical correlation
and conversion system 

US Pat. 6741743 - Filed Jul 31, 1998 - PRC. Inc. 

To search for specific ordinary skill will be able to affect various changes, 

... of the required to successfully implement the OC and OCR aspects with the ...






http://www.google.com/patents?q=ocr+and+search+documents+and+translate+keyword 1/18/2008



ocr and search documents and translate - Google Patents Page 1 of 2


Patents 1 - 7 on ocr and search documents and translate.


Imaged document optical correlation 
and conversion system 

US Pat. 6741743 - Filed Jul 31, 1998 - PRC. Inc. 

To search for specific ordinary skill will be able to affect various changes, 

... of the required to successfully implement the OC and OCR aspects with the ...

Document-based query data for information retrieval 

US Pat. 6396951 - Filed Dec 23, 1998 - Xerox Corporation 

The significant image units are then decoded by optical character recognition (OCR) 
techniques, and the decoded words can then be used to access ...

Automatic language identification using both N-gram and word information 

US Pat. 6167369 - Filed Dec 23, 1998 - Xerox Company 

The search engine could, for example, automatically identify the language of

a query, translate the query into one or more other languages, ... 

Apparatus for translation of character codes for application to a data ... 

US Pat. 4425626 - Filed Nov 29, 1979 - Honeywell Information Systems Inc. 

The character codes read from the documents include a number of special characters 

... read heads may read OCR (Optical Character Recognition) characters. ... 

Indexing with translation model for feature regularization 

US Pat. 6925436 - Filed Jan 28, 2000 - International Business Machines Corporation 
There, the statistical translation model was trained to translate from speech 
... handwriting indexing, OCR indexing) broadens this domain further. ... 

System for converting medical information into representative abbreviated ... 

US Pat. 5809476 - Filed Mar 23, 1995 

... documents in which case Optical Character Recognition (OCR) could be used. 
Alternatively a voice recognition system may be used to directly translate ... 

Systems and methods for adaptive handwriting recognition

US Pat. 7184591 - Filed May 21, 2003 - Microsoft Corporation 

Thus, optical character recognition (OCR) technology was developed to utilize. 

... the OCR software to translate a scanned image into editable text. ...




http://www.google.com/patents?q=ocr+and+search+documents+and+translate 1/18/2008






ocr and search documents and output xml - Google Patents Page 1 of 1 


Patents 1 - 2 on ocr and search documents and output xml.


User-defined search template for 
extracting information from documents 

US Pat. 6353840 - Filed Aug 13, 1998 - Ricoh Company, Ltd. 

search template. The documents are either stored in a storage device such as the 

... An optical character recognition unit (OCR) 207 converts alphanumeric ... 

System and method for automatic preparation of data repositories from ... 

US Pat. 6810136 - Filed Oct 18, 2002 - Olive Software Inc. 

These time-based definitions typically increase the speed and efficiency of 

operation of a search engine through the output format data, for example. ... 






http://www.google.com/patem 1/18/2008 



